Download course materials from here.
bit.ly/———–
These packages should already be installed on the lab computers. Please make sure you have installed them on your laptop before tomorrow, if you are using one.
install.packages(c("tidyverse", "ordinal", "lme4"))
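To confirm the packages are actually available before the workshop, something like the following can be run in the console. This is only a minimal sketch: it checks for the three packages named above and does not install anything, and pkgs, is_installed, and missing_pkgs are names made up for illustration.

```r
# Packages used in this workshop (names taken from the install command above)
pkgs <- c("tidyverse", "ordinal", "lme4")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Keep only the names of the packages that still need installing
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) == 0) {
  message("All workshop packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```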
This workshop is designed to introduce you to practical uses and issues in R and RStudio, aimed at linguists and psychologists (well, psycholinguists). On this first day, we will start with the fundamentals of interacting with the programs.
R is a programming language that we can use to tell the computer what to do. We can “speak” R to the computer in a number of ways, e.g., through the command line (Terminal on Macs) or through the R app.
Before you can use R and RStudio, it’s important to understand what you’re looking at and where to find things.
R is the program and programming language that allows you to input commands and get the computer to do things. In order to interact with R, some people use a simple R interface, some people use the command line, and some people use RStudio.
RStudio is a GUI (Graphical User Interface) that allows you to interact with R and keep everything organised.
We’ll be using RStudio because it is the easiest to use, with some point-and-click commands, but still with the full functionality and power of R. R runs in the background when you run RStudio, but RStudio takes care of that on its own so all you need to do is open RStudio.
The RStudio interface has (up to) four panes that you can rearrange and customise to suit your needs. Here is the default configuration:
When you open RStudio for the first time, there may only be three panes.
The console is your direct line of communication with R. It operates a bit like a chat window (if you’re familiar with that) because you can type things into the console, hit ENTER, and R will do something (and sometimes, depending on what you type, it’ll respond). You know the console is listening and ready to accept a new command if you see a > on the left edge of the window on the lowest line of text.
The environment is a set of three tabs that effectively lets us see into “R’s brain” (thanks Danielle!). This window lets you see what variables you’ve created, what datasets are loaded in, what packages and libraries are loaded, among many other things. Right now, it’s empty.
If we click on Global Environment, we can see what packages have automatically been loaded into this R session. Once you load other packages in, you will see them here too.
The pane in the lower right has five tabs by default.
Zoom to pop the graph out into a separate window and resize it.
Export to save the plot as a PNG, PDF, EPS, etc.
How do you keep all your related files organised?
Keeping your files organised will make your life infinitely easier and will help you ease back into using R if you come back to it after a hiatus. The RStudio interface will help with this, but there is no better foundation than good file management.
Danielle Turton and I suggest the following as a best practice:
RStudio offers a neat feature called a project (file type .Rproj). A project keeps all your scripts and datasets handy and can save variables for use later, even if you quit and restart RStudio. This can be useful if you are running complex models, for instance.
To create a new project, click on the small triangle in the upper right corner of the RStudio window.
This will bring up a menu with a number of useful options, but right now, we want to create a New Project…
Since we have already created a folder (i.e., a directory) for our project, we can click on the Existing Directory option. If you haven’t created a folder for your project, you can create a new directory, but in this case you should still follow best practices for file organisation.
From this window, we can browse our computers for the location of the folder we’ve set up.
I have noticed that sometimes students whose computers are set up with a non-English language, particularly with a non-Latin alphabet, can run into problems with setting up these folders. It is important to instruct them to use only Latin characters for the folders that R and RStudio will access, as non-Latin characters use a type of encoding that not all programs can read.
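A quick way to spot a problematic folder name is to check whether the path contains anything other than plain ASCII characters. The sketch below is one possible check, not a feature of RStudio: is_ascii_path() is a hypothetical helper written for this example, and the second path is invented.

```r
# Hypothetical helper: TRUE if every character in the path is plain ASCII
is_ascii_path <- function(path) {
  all(utf8ToInt(enc2utf8(path)) <= 127L)
}

# Check the folder R is currently working in
project_dir <- getwd()
is_ascii_path(project_dir)

# A made-up path containing Cyrillic letters (written as \u escapes) fails the check
is_ascii_path("C:/Users/\u0414\u0430\u043d\u043d\u044b\u0435/project")
```

If the first check returns FALSE, renaming the offending folder (or moving the project) before creating the .Rproj file is usually the simplest fix.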
Scripts will appear in the fourth pane (top left by default).
Scripts are simply text files; they are not R, and they don’t do anything unless you perform specific actions on them. They’re a bit like instruction manuals, but R can only read them if you manually send the instructions to the console (more on this later).
A script is a way to save your work so you only need to write the code once. Once you’ve written a script, you can execute it as many times as you like without writing it again. You can copy and paste other people’s code into your script and tweak it to fit your needs. You can debug your code without having to retype it each time. You can share your code with others, and you can leave comments to yourself in your code so that you can let it sit for a while and still remember what you were doing when you come back to it.
On all platforms: the green + symbol in the top left corner will let you create a new script.
On a Mac: command+shift+N
On a PC:
The symbol # hides the text that follows on that line from R, that is, it “comments it out”. This lets you write comments to yourself and to anyone else who might read your code. You’ll want to do this so you can remember what you were doing (trust me, you will not remember) and so other people can replicate what you did (even if it’s just to help you debug your code later).
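Here is a small illustration of the two usual places a comment can go, plus a line of code that has been commented out. The variable name total is made up for the example.

```r
# A full-line comment: R ignores everything after the # symbol
total <- 3 + 4  # an end-of-line comment; the code before the # still runs

# total <- 100  # this whole assignment is commented out, so it never runs

total  # still 7, because the line above was skipped
```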
If you type print("Hello World!") (with a nice comment) and then hit ENTER in your console, you will see something like this:
> print("Hello World!") # this line will produce the text between the quotes
[1] "Hello World!"
If you type print("Hello World!") and hit ENTER in a script, you will see something like this:
print("Hello World!")
(Note that nothing happens. There is no output. A script is just a text file.)
Open a new script and type print("Hello World!") into it. How do you get R to execute your code? There are a couple ways:
Run button on the top right portion of the script pane.
Put your cursor on a line and press Command+Enter to run that line.
Highlight several lines and press Command+Enter to run the selection.
Press Command+Shift+Enter to run the entire script.
When you run a script, the text in the script is sent, line by line, to the console. Once in the console, R executes the code. You can watch the code progress in the console. If there is any text or numerical output, it will appear in the console. If there is a graphical output, it will appear in the Plots tab of the lower right pane (Files etc).
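A whole script can also be run from the console itself with the base function source(). The sketch below is self-contained, so it first writes a throwaway script to a temporary file; in practice you would give source() the path to your own script instead. The name script_path is just for illustration.

```r
# Create a small throwaway script in a temporary location
script_path <- tempfile(fileext = ".R")
writeLines(
  c('print("Hello World!")  # a line with a comment',
    "3 + 4"),
  script_path
)

# Send every line of the script to R; echo = TRUE shows each command
# alongside its output, much like running it from the script pane
source(script_path, echo = TRUE)
```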
Now that we’ve gone through the fundamentals of organisation and basic R operators, we can get into the more important task of reading data and writing output (i.e., saving our work).
Once you’ve created a project (.Rproj), you can open the project and it should open right to where you left off. This means your scripts will open as well, as long as you did not close them the last time you worked on the project. However, if you want to open a script that isn’t currently open but does exist, you can do so in the Files tab of the lower right pane. Click on the name of the script, and it will open in a new tab of your scripts pane (top left by default).
R’s least unique function is to compute arithmetic. You probably don’t want to use R just to do sums, but you could do.
3 + 4
[1] 7
White space doesn’t matter for most things (so you can put spaces between numbers, operations, names).
3+4
[1] 7
Order of operations follows standard rules (BIDMAS / BODMAS / PEMDAS):
3*4^2
[1] 48
(3*4)^2
[1] 144
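A few more lines to try, mixing several operators. One that often surprises people is the minus sign in front of an exponent: R applies the exponent first, so brackets are needed to square a negative number.

```r
2 + 3 * 4     # multiplication happens before addition: 14
(2 + 3) * 4   # brackets first: 20
10 - 4 / 2    # division happens before subtraction: 8
-2^2          # the exponent is applied before the minus sign: -4
(-2)^2        # brackets make the minus sign part of the base: 4
```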
In the script, enter the following code:
print("Hello world!")
3 + 4
7^2
Run the entire script.
Your console should end up looking like this:
> print("Hello world!")
[1] "Hello world!"
> 3 + 4
[1] 7
> 7^2
[1] 49
>
(In the rest of this document, instead of output starting with >, it may start with ##. This is due to how R compiles a script to an HTML document and does not change anything in the script, code, or contentful output.)
What does [1] mean?
The number in square brackets at the left of each line of output is the index of the first value printed on that line. If all the output fits on one line, only [1] will appear. If the output wraps and the seventh item is the first to appear on the second line, that line will start with [7]. This is useful when trying to make sense of long lists of numbers, for instance.
Below, I saved two longer strings of text into one variable called longText. Then when it’s printed, it appears on two lines. The [1] indicates the first item in the list is the first item on the line and the [2] indicates the second item in the list is the first item that appears on that line.
longText <- c("i don't think we'll be able to fit this text on one line of the output console","so that it might wrap around and display on two lines")
print(longText)
[1] "i don't think we'll be able to fit this text on one line of the output console"
[2] "so that it might wrap around and display on two lines"
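The same thing can be seen with a longer run of numbers. In this sketch (the variable name numbers is made up), the printed output will most likely wrap over more than one line, and each line will start with the index of its first value. Exactly where the lines break depends on how wide your console is, so your bracketed numbers may differ from your neighbour’s.

```r
# Thirty consecutive numbers, enough to wrap in most console widths
numbers <- 101:130
print(numbers)

# The index in brackets matches what you get by asking for that position
numbers[7]
```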
One of the most important components of (almost) any programming language is a variable. A variable is an object that can be assigned a value, a list of values, a matrix of values, or something along those lines. A variable in R can be named almost anything with a few exceptions:
A variable name…
can contain . or _ but cannot contain other non-alphanumeric characters
To assign a value (etc) to a variable, there are three possible operators: =, <-, or occasionally -> (the latter two look like arrows).
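If you are unsure whether a name is acceptable, one option is to ask R. The sketch below uses the base function make.names(), which rewrites a string into a syntactically valid name; if the string comes back unchanged, it was already valid. The candidate names are made up for illustration.

```r
# Some made-up candidate variable names
candidates <- c("my_var", "my.var", "var2", "2var", "my-var", "my var")

# make.names() repairs invalid names, so an unchanged name was already valid
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates

# TRUE for the first three; FALSE for the name starting with a digit,
# the name with a hyphen, and the name with a space
is_valid
```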
First we assign the value of 3 to the variable x. Notice that spaces between the numbers and the operator are optional. I like to use them to help see each component more clearly.
x = 3 # or x=3
x
[1] 3
We can also use one of the arrow operators. The two characters that make up the arrow cannot have a space between them. They act as a single unit. For the leftward-facing arrow, the value(s) on the right are assigned to the variable on the left.
y <- 4 # or y<-4
y
[1] 4
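It is worth seeing what happens when a space does sneak into the arrow, because R will not complain. Instead of assigning, it quietly reads the < and the - separately and makes a comparison.

```r
y <- 4    # assignment: y now holds 4
y < -4    # comparison: "is y less than -4?", which gives FALSE
y         # y is unchanged, still 4
```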
Now that we’ve assigned values to x and y, we can do things to them. Below, I demonstrate some basic arithmetic operators that allow R to act like a calculator.
x + y
[1] 7
x * y
[1] 12
x / y
[1] 0.75
x ^ y
[1] 81
Finally, the rightward-pointing arrow is used very rarely, but functions in a similar way to the other two assignment operators. The value(s) on the left are assigned to the variable on the right.
x + y -> z # or z <- x + y or z=x+y
z
[1] 7
To assign more than one value to a variable, there are several functions we can use, but the most common and easiest is c(). To read more about the function c(), you can type c into the search bar in the help tab in the lower right pane, or you can input the following into your console:
?c
In short, this function combines a series of values into a vector or list.
c(1,2,3)
[1] 1 2 3
c("1","2","3")
[1] "1" "2" "3"
c("one","two","three")
[1] "one" "two" "three"
c(one,two,three)
Error: object 'one' not found
Note that bare numbers are green and are output without quotation marks, numbers and words in quotation marks are output as such, but words without quotation marks produce an error. This is because R assumes that all words without quotation marks are variables, but we haven’t created variables named one, two, or three.
one = 1
two = 2
three = 3
c(one,two,three)
[1] 1 2 3
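The difference between numbers and quoted text matters once you start doing things with a vector. The sketch below (variable names made up) shows that a vector holds one type of value at a time: if numbers and text are mixed inside c(), the numbers are converted to text.

```r
numbers <- c(1, 2, 3)
words <- c("one", "two", "three")
mixed <- c(1, "two", 3)

class(numbers)  # "numeric"
class(words)    # "character"
mixed           # all three items are now text: "1" "two" "3"

# Arithmetic works on every element of a numeric vector at once
numbers * 2     # 2 4 6
```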
Dataset sleep:
head(sleep)
What does this do?
sleep[1,]
How does this differ from [1,]?
sleep[2,] # so that means this is row 2
How does this differ from [3,]?
sleep[,3] # what do you think this is?
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Levels: 1 2 3 4 5 6 7 8 9 10
…which makes it dataset[row,column]
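The dataset[row,column] pattern can be combined in a few ways. The lines below are a short sketch using the built-in sleep dataset; each can be run on its own.

```r
sleep[2, 1]           # a single cell: row 2, column 1
sleep[1:3, ]          # rows 1 to 3, all columns
sleep[, c(1, 3)]      # all rows, columns 1 and 3 only
sleep[1:3, "extra"]   # a column can also be picked by its name in quotes
```

Leaving the space before or after the comma empty means "all rows" or "all columns" respectively.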
You can also navigate with column names:
sleep$ID
[1] 1 2 3 4 5 6 7 8 9 10 1 2 3 4 5 6 7 8 9 10
Levels: 1 2 3 4 5 6 7 8 9 10
How would you view the column extra?
sleep$extra
[1] 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0.0 2.0 1.9 0.8 1.1 0.1 -0.1 4.4 5.5 1.6 4.6 3.4
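There is more than one way to pull out the same column, and once you have it you can pass it straight to a function. A brief sketch, with made-up variable names:

```r
# Three ways of getting the column called extra
extra_by_dollar   <- sleep$extra
extra_by_name     <- sleep[, "extra"]
extra_by_brackets <- sleep[["extra"]]

# All three give exactly the same vector
identical(extra_by_dollar, extra_by_name)
identical(extra_by_dollar, extra_by_brackets)

# A column can be handed directly to a function
mean(sleep$extra)
range(sleep$extra)
```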
Use str() to get a summary of the structure of the dataset
str(sleep)
'data.frame': 20 obs. of 3 variables:
$ extra: num 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0 2 ...
$ group: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ ID : Factor w/ 10 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
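Each piece of information in the str() output can also be requested separately, which is handy when you only need one of them. These are all base R functions:

```r
nrow(sleep)          # number of rows (observations)
ncol(sleep)          # number of columns (variables)
names(sleep)         # the column names
levels(sleep$group)  # the levels of a factor column
summary(sleep)       # a short numerical summary of every column
```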
What are all the unique values in extra?
unique(sleep$extra)
[1] 0.7 -1.6 -0.2 -1.2 -0.1 3.4 3.7 0.8 0.0 2.0 1.9 1.1 0.1 4.4 5.5 1.6 4.6
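unique() works the same way on any column, and wrapping it in length() counts the distinct values. For factor columns, table() goes one step further and counts how often each value occurs. A short sketch:

```r
unique(sleep$ID)             # the distinct participant IDs
length(unique(sleep$ID))     # how many distinct IDs there are
length(unique(sleep$extra))  # how many distinct values of extra there are
table(sleep$group)           # how many rows fall in each group
```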
What’s the value in the first row, third column?
sleep[1,3]
[1] 1
Levels: 1 2 3 4 5 6 7 8 9 10
What’s the first element in the column ID?
sleep[1,]$ID
[1] 1
Levels: 1 2 3 4 5 6 7 8 9 10
sleep$ID[1]
[1] 1
Levels: 1 2 3 4 5 6 7 8 9 10
You can also view the dataset as a spreadsheet (although it can’t be altered).
View(sleep)